Bayesian entropy estimation for countable discrete distributions
نویسندگان
چکیده
We consider the problem of estimating Shannon’s entropy H from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian non-parametric statistics and machine learning. Here we show that it also provides a natural family of priors for Bayesian entropy estimation, due to the fact that moments of the induced posterior distribution over H can be computed analytically. We derive formulas for the posterior mean (Bayes’ least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over H, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous mixing measures such that the resulting mixture of Pitman-Yor processes produces an approximately flat prior over H. We show that the resulting Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of distributions. We explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data.
منابع مشابه
Bayesian Estimation of Shift Point in Shape Parameter of Inverse Gaussian Distribution Under Different Loss Functions
In this paper, a Bayesian approach is proposed for shift point detection in an inverse Gaussian distribution. In this study, the mean parameter of inverse Gaussian distribution is assumed to be constant and shift points in shape parameter is considered. First the posterior distribution of shape parameter is obtained. Then the Bayes estimators are derived under a class of priors and using variou...
متن کاملDetermination of Maximum Bayesian Entropy Probability Distribution
In this paper, we consider the determination methods of maximum entropy multivariate distributions with given prior under the constraints, that the marginal distributions or the marginals and covariance matrix are prescribed. Next, some numerical solutions are considered for the cases of unavailable closed form of solutions. Finally, these methods are illustrated via some numerical examples.
متن کاملBayesian estimation of discrete entropy with mixtures of stick-breaking priors
We consider the problem of estimating Shannon’s entropyH in the under-sampled regime, where the number of possible symbols may be unknown or countably infinite. Dirichlet and Pitman-Yor processes provide tractable prior distributions over the space of countably infinite discrete distributions, and have found major applications in Bayesian non-parametric statistics and machine learning. Here we ...
متن کاملOn the lower limits of entropy estimation
In recent paper, Antos and Kontoyiannis [1] considered the problem of estimating the entropy of a countably infinite discrete distribution from independent identically distributed observations. They left several open problems regarding the convergence rates of entropy estimates, which we consider here. Our first result, is that the plug-in estimate of entropy is as efficient as a match length e...
متن کاملShannon Entropy Estimation in $\infty$-Alphabets from Convergence Results
The problem of Shannon entropy estimation in countable infinite alphabets is revisited from the adoption of convergence results of the entropy functional. Sufficient conditions for the convergence of the entropy are used, including scenarios with both finitely and infinitely supported distributions. From this angle, four plug-in histogram-based estimators are studied showing strong consistency ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Machine Learning Research
دوره 15 شماره
صفحات -
تاریخ انتشار 2014